Kernel Single-Index Bandits: Estimation, Inference, and Learning

Arya, Sakshi, Bhattacharjee, Satarupa, Sriperumbudur, Bharath K.

arXiv.org Machine Learning

We study contextual bandits with finitely many actions in which the reward of each arm follows a single-index model with an arm-specific index parameter and an unknown nonparametric link function. We consider a regime in which arms correspond to stable decision options and covariates evolve adaptively under the bandit policy. This setting creates significant statistical challenges: the sampling distribution depends on the allocation rule, observations are dependent over time, and inverse-propensity weighting induces variance inflation. We propose a kernelized $\varepsilon$-greedy algorithm that combines Stein-based estimation of the index parameters with inverse-propensity-weighted kernel ridge regression for the reward functions. This approach enables flexible semiparametric learning while retaining interpretability. Our analysis develops new tools for inference with adaptively collected data. We establish asymptotic normality for the single-index estimator under adaptive sampling, yielding valid confidence regions, and derive a directional functional central limit theorem for the RKHS estimator, which provides asymptotically valid pointwise confidence intervals. The analysis relies on concentration bounds for inverse-weighted Gram matrices together with martingale central limit theorems. We further obtain finite-time regret guarantees, including $\tilde{O}(\sqrt{T})$ rates under common-link Lipschitz conditions, showing that semiparametric structure can be exploited without sacrificing statistical efficiency. These results provide a unified framework for simultaneous learning and inference in single-index contextual bandits.
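To make the algorithmic idea concrete, here is a minimal sketch of a kernelized ε-greedy contextual bandit that fits a per-arm inverse-propensity-weighted kernel ridge regression. All class and function names, the RBF kernel, and the toy tanh-link environment are our own illustrative choices; the paper's full estimator additionally learns arm-specific single-index directions via Stein-based estimation, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class KernelEpsGreedy:
    """Illustrative kernelized epsilon-greedy agent with per-arm
    inverse-propensity-weighted kernel ridge regression (a sketch,
    not the paper's exact estimator)."""

    def __init__(self, n_arms, eps=0.1, lam=1.0):
        self.n_arms, self.eps, self.lam = n_arms, eps, lam
        self.X = [[] for _ in range(n_arms)]  # contexts observed per arm
        self.y = [[] for _ in range(n_arms)]  # rewards observed per arm
        self.w = [[] for _ in range(n_arms)]  # inverse-propensity weights

    def _predict(self, a, x):
        """Weighted kernel ridge prediction of arm a's reward at context x."""
        if not self.X[a]:
            return 0.0
        Xa, ya = np.array(self.X[a]), np.array(self.y[a])
        W = np.diag(self.w[a])
        K = rbf_kernel(Xa, Xa)
        # Minimizer of sum_i w_i (y_i - f(x_i))^2 + lam ||f||^2 over the RKHS:
        alpha = np.linalg.solve(W @ K + self.lam * np.eye(len(ya)), W @ ya)
        return float(rbf_kernel(x[None, :], Xa) @ alpha)

    def select(self, x):
        """Epsilon-greedy action; returns the chosen arm and its propensity."""
        preds = [self._predict(a, x) for a in range(self.n_arms)]
        probs = np.full(self.n_arms, self.eps / self.n_arms)
        probs[int(np.argmax(preds))] += 1.0 - self.eps
        a = int(rng.choice(self.n_arms, p=probs))
        return a, probs[a]

    def update(self, a, x, r, p):
        """Store the observation with its inverse-propensity weight 1/p."""
        self.X[a].append(x); self.y[a].append(r); self.w[a].append(1.0 / p)

# Toy single-index environment: reward_a(x) = tanh(x . theta_a) + noise.
theta = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
agent = KernelEpsGreedy(n_arms=2, eps=0.2)
for t in range(200):
    x = rng.normal(size=2)
    a, p = agent.select(x)
    r = np.tanh(x @ theta[a]) + 0.1 * rng.normal()
    agent.update(a, x, r, p)
```

Storing the propensity at selection time is what makes the inverse-propensity weighting in `_predict` well defined under the adaptive sampling the abstract describes.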





Statistical-Computational Tradeoffs in High-Dimensional Single Index Models

Neural Information Processing Systems

We study the statistical-computational tradeoffs in a high-dimensional single index model Y = f(X⊤β) + ε, where f is unknown, X is a Gaussian vector, and β is s-sparse with unit norm. When Cov(Y, X⊤β) ≠ 0, [43] shows that the direction and support of β can be recovered using a generalized version of the Lasso.
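The covariance condition can be illustrated numerically: with a link such as f(u) = u + u³, for which Cov(Y, X⊤β) = E[u²] + E[u⁴] = 4 > 0 under a standard Gaussian index, a plain Lasso fit on (X, Y) already recovers the direction and support of β up to scale. The ISTA solver and all problem constants below are our own illustrative choices, not the generalized Lasso of [43].

```python
import numpy as np

rng = np.random.default_rng(1)

def ista_lasso(X, y, lam, n_iter=500):
    """Plain ISTA solver for min_b (1/2n)||y - Xb||^2 + lam ||b||_1."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        b -= X.T @ (X @ b - y) / (n * L)                       # gradient step
        b = np.sign(b) * np.maximum(np.abs(b) - lam / L, 0.0)  # soft-threshold
    return b

n, p, s = 500, 50, 3
beta = np.zeros(p)
beta[:s] = 1.0 / np.sqrt(s)                # s-sparse with unit norm
X = rng.normal(size=(n, p))                # Gaussian design
u = X @ beta
Y = u + u ** 3 + 0.5 * rng.normal(size=n)  # link f(u) = u + u^3

b_hat = ista_lasso(X, Y, lam=0.2)
# Direction recovery is only up to the unknown scale Cov(Y, u):
cos = abs(b_hat @ beta) / np.linalg.norm(b_hat)
```

Because the link is unknown, only the direction of β is identifiable, which is why the alignment `cos` rather than the raw coefficient error is the natural success metric here.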


13d4635deccc230c944e4ff6e03404b5-AuthorFeedback.pdf

Neural Information Processing Systems

We appreciate the valuable comments from reviewers on paper presentation and typos. We will revise our work accordingly. Compared with these related works, we consider a larger model class.




2063a00c435aafbcc58c16ce1e522139-Paper-Conference.pdf

Neural Information Processing Systems

Amongst those functions, the simplest are single-index models f(x) = ϕ(x⊤θ), where the labels are generated by an arbitrary non-linear scalar link function ϕ applied to an unknown one-dimensional projection θ of the input data.